Abstract

This note recapitulates an algorithmic observation for ordered Depth-First Search (DFS)
in directed graphs that immediately leads to a parallel algorithm with linear speed-up for
a range of processors for non-sparse graphs. The note extends the approach to ordered
Breadth-First Search (BFS). With p processors, both DFS and BFS algorithms run in
O(m/p + n) time steps on a shared-memory parallel machine allowing concurrent reading
of locations, e.g., a CREW PRAM, and have linear speed-up for p < m/n. Both algorithms
need n synchronization steps.

1 Introduction 

Depth- and Breadth-First Search are elementary graph traversal procedures with simple, sequential
algorithms [21 El E]. Both procedures pose problems for parallel implementation: the ordered
Depth-First Search (DFS) problem is P-complete [3], and therefore unlikely to admit a
polylogarithmically fast parallel algorithm using only polynomial resources, whereas for
Breadth-First Search (BFS), no work-optimal, polylogarithmically fast parallel algorithm is known.
This note re-presents the simple, work-optimal, linear time, parallel algorithm for Depth-First
Search by Varman and Doshi [8] (also Vishkin, personal communication), which can give linear
speed-up for graphs that are not too sparse, and extends the basic observation to Breadth-First
Search, where an algorithm with similar properties is given. The idea is simple: instead of examining
arcs in the "forwards" direction, as in standard, textbook formulations of DFS and BFS [2],
incoming, "backwards" arcs are used to eliminate arcs that are no longer relevant for the search.
Whereas the standard algorithms have conflicts (BFS) and/or dependencies (DFS) that
hamper parallelization, arc elimination can be performed fully in parallel.

Let G = (V, E) be a directed graph with n = |V| vertices and m = |E| arcs (directed edges).
Vertices are assumed to be numbered consecutively, such that V = {0, ..., n - 1}. Arcs are
ordered pairs of vertices, with (u, v), u, v ∈ V, denoting the arc directed from u to v.

It will be assumed that the input graph G is given as an n-element array ADJ of adjacency
arrays. For each vertex u ∈ V, ADJ[u].outdeg stores the out-degree of u, and the target vertex
v_i of the ith arc (u, v_i) for 0 ≤ i < ADJ[u].outdeg is stored in ADJ[u].out[i].
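The representation just described can be sketched in a few lines of Python. This is only an illustration: the class name Vertex and the helper build_adj are hypothetical, and only the field names out and outdeg are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """One entry ADJ[u]; the field names mirror those used in the text."""
    out: list = field(default_factory=list)  # targets of outgoing arcs, in fixed order

    @property
    def outdeg(self):
        # Out-degree of the vertex, i.e. the length of its adjacency array.
        return len(self.out)


def build_adj(n, arcs):
    """Build ADJ for vertices 0..n-1 from a list of (u, v) arcs.

    The order of the arcs in the input list determines the order in
    which they appear in each adjacency array (hypothetical helper).
    """
    adj = [Vertex() for _ in range(n)]
    for u, v in arcs:
        adj[u].out.append(v)
    return adj


ADJ = build_adj(4, [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)])
```

Here ADJ[2].out[1] is the target of the second arc out of vertex 2; the order within each array is what makes the later searches "ordered".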

Depth- and Breadth-First Search are procedures for graph traversal starting from a given
start vertex s ∈ V. Both procedures assign traversal numbers to the vertices, indicating the
order in which they are reached. Breadth-First Search additionally computes for each vertex its
distance (shortest path in number of traversed arcs) from the start vertex. Both procedures also
compute the search tree, which will be represented by a parent pointer. For each vertex u ∈ V,
these computed values will be stored in ADJ[u].traversal, ADJ[u].distance and ADJ[u].parent
(for u ≠ s), respectively.

The search procedures will modify the input graph by eliminating arcs, and both maintain 
the following invariant. 

Invariant 1 A vertex v is called visited when it has been assigned its (Depth- or Breadth-First
Search) traversal number. Each visited vertex v ∈ V will have no incoming arcs, that is, there
will be no arc (u, v) for any u ∈ V.

In order to maintain Invariant [1], the procedures eliminate incoming arcs when a vertex is
being visited. To do this efficiently, each vertex v ∈ V needs an array storing the vertices u ∈ V
for which there is an arc (u, v), as well as the index i of v in the array ADJ[u].out such that
v = ADJ[u].out[i]. The arrays ADJ[v].in for each v ∈ V shall store the pairs (u, i) representing
the incoming arcs of v in this fashion.

To eliminate the arc (u, v) from the adjacency array of u, links (indices) to next and previous
non-removed vertices in the array are maintained, imposing a doubly linked list on each of the
adjacency arrays. The operation eliminate(G, u, i) removes the ith vertex in ADJ[u].out by
linking it out of the doubly linked list. The adjacency array itself is not changed. Next and
previous indices are maintained in the arrays ADJ[u].next and ADJ[u].prev; ADJ[u].first shall
index the first non-eliminated vertex in ADJ[u].out.

Algorithm [1] shows how to compute the array of incoming arcs and the pointers for the 
doubly linked adjacency lists. A for-construct indicates sequential execution for all values in 
some index set (in some order), whereas the par-construct indicates that the computations for 
each element in the index set can be performed in parallel by the available processors. All 
processors are assumed to have access to the same memory, and concurrent reading is allowed. 
Synchronization is implied at the end of each par-construct. 

Algorithm 1 Computing incoming arcs for each v ∈ V and doubly linked adjacency lists.
par u ∈ V do
    ADJ[u].indeg <- 0
    ADJ[u].first <- 0
end par
for u ∈ V do
    par 0 ≤ i < ADJ[u].outdeg do
        v <- ADJ[u].out[i]
        d <- ADJ[v].indeg
        ADJ[v].in[d] <- (u, i) {Add incoming arc (u,v) to v}
        ADJ[v].indeg <- d + 1
        ADJ[u].next[i] <- i + 1
        ADJ[u].prev[i] <- i - 1
    end par
end for



Lemma 1 Algorithm [1] computes the array of (u, i) vertex-index pairs representing the incoming
arcs for all vertices v ∈ V. It also initializes the doubly linked lists over the adjacency arrays
ADJ[u].out. The algorithm runs in O(m/p + n) time steps with p processors using O(m + n)
additional space.
Proof: In each sequential iteration over the set of vertices, incoming arcs are added to different
target vertices. For each u ∈ V this can therefore be done in parallel by the p available
processors in O(d(u)/p) time steps, where d(u) is the outdegree of vertex u, provided that all
processors can read the start address of the array. The total time is
O(n + Σ_{u ∈ V} d(u)/p) = O(m/p + n).
□
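As an illustration of Algorithm [1], the following is a minimal sequential sketch in Python. It assumes the graph is given as a plain list of lists out, where out[u] plays the role of ADJ[u].out; the function name init_arcs and the returned tuple layout are hypothetical. The inner loop corresponds to the par-construct and is simply executed sequentially here.

```python
def init_arcs(out):
    """Sequential rendering of Algorithm 1 (a sketch, not a parallel implementation).

    out[u] is the adjacency array of vertex u.
    Returns (inc, nxt, prv, first) where
      inc[v]    lists the pairs (u, i) with out[u][i] == v,
      nxt[u][i] and prv[u][i] are the next/previous indices in u's list,
      first[u]  is the index of the first non-eliminated arc out of u.
    """
    n = len(out)
    inc = [[] for _ in range(n)]  # incoming arcs; len(inc[v]) plays the role of indeg
    first = [0] * n
    nxt = [None] * n
    prv = [None] * n
    for u in range(n):  # sequential loop over the vertices
        deg = len(out[u])
        nxt[u] = [i + 1 for i in range(deg)]  # next index; deg marks the end of the list
        prv[u] = [i - 1 for i in range(deg)]  # previous index; -1 marks the start
        for i, v in enumerate(out[u]):  # the par-loop of Algorithm 1
            inc[v].append((u, i))
    return inc, nxt, prv, first
```

For out = [[1, 2], [2], [0, 3], []], vertex 2 has the incoming arcs (0, 1) and (1, 0), meaning that it is the second target of vertex 0 and the first target of vertex 1.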

The ADJ[u].first indices for each u ∈ V will be maintained such that ADJ[u].first < ADJ[u].outdeg
indicates a non-empty list of non-eliminated arcs out of u. The eliminate operation is
straightforward.
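Since the eliminate operation is not spelled out, here is one possible sketch in Python. The class AdjList and its method remaining are hypothetical; the conventions that prev = -1 marks the start of the list and next = outdeg marks its end are assumptions chosen to match the initialization in Algorithm [1]. The sketch also assumes that each arc is eliminated at most once.

```python
class AdjList:
    """Adjacency array of one vertex with a doubly linked list imposed on it."""

    def __init__(self, targets):
        self.out = list(targets)
        self.outdeg = len(self.out)
        self.next = [i + 1 for i in range(self.outdeg)]
        self.prev = [i - 1 for i in range(self.outdeg)]
        self.first = 0  # index of the first non-eliminated arc

    def eliminate(self, i):
        """Link index i out of the list in constant time; self.out is not changed."""
        p, q = self.prev[i], self.next[i]
        if p >= 0:
            self.next[p] = q  # predecessor now skips over i
        else:
            self.first = q  # i was the first arc, so the list now starts at q
        if q < self.outdeg:
            self.prev[q] = p  # successor now points back past i

    def remaining(self):
        """Targets of the non-eliminated arcs, in adjacency order (for inspection)."""
        res, i = [], self.first
        while i < self.outdeg:
            res.append(self.out[i])
            i = self.next[i]
        return res


a = AdjList([5, 6, 7, 8])
a.eliminate(1)
a.eliminate(0)
```

After these two calls a.first is 2 and a.remaining() yields the targets 7 and 8. When a vertex v is visited, the arcs into v come from distinct source vertices provided the graph has no parallel arcs, so the eliminate calls issued in one par-construct then touch different lists; this is a reading of the text rather than something it states explicitly.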

2 Depth-First Search

We can now present the parallel Depth-First Search algorithm. The DFS procedure is called
with a start vertex s ∈ V and an initial DFS number a, and computes a DFS tree with reachable
vertices numbered successively in DFS order starting from a. Each recursive call visits a new
vertex, assigns it a DFS number, establishes Invariant [1] by eliminating, in parallel, all arcs into
the vertex, and then recursively DFS numbers the subtree from the first non-eliminated arc
(s, v) out of s; outgoing arcs are always considered in the fixed order given in the adjacency
array representation of G. The recursion traverses the vertices in G reachable from s in DFS
order. The algorithm is given in detail as Algorithm [2] and is essentially as described by Varman
and Doshi [8].

Algorithm 2 Recursive, parallel Depth-First Search from start vertex s ∈ V. Vertices that are
reachable from s will be assigned successive DFS numbers starting from a.
Procedure DFS(s, G, a):
par 0 ≤ i < ADJ[s].indeg do
    (u, j) <- ADJ[s].in[i]
    eliminate(G, u, j)
end par
ADJ[s].traversal <- a {Vertex s now visited}
a <- a + 1
while ADJ[s].first < ADJ[s].outdeg do {As long as there are un-eliminated arcs}
    i <- ADJ[s].first
    v <- ADJ[s].out[i]
    ADJ[v].parent <- s
    a <- DFS(v, G, a)
end while
return a



Proposition 1 Algorithm [2] computes an ordered Depth-First Search numbering and tree in
O(m/p + n) time steps using p processors.

Proof: By Invariant [1], once a vertex v is visited, it will never be considered again, since
all arcs into v will have been eliminated. Therefore, each vertex in G that is reachable from s
will be visited once. The time complexity is immediate: when a vertex is visited, its incoming
arcs are eliminated in parallel. Since ADJ[u].first will for each vertex u be the index of the
first adjacent vertex v where the arc (u, v) has not been eliminated, the order in which vertices
are visited is the same as in standard DFS procedures, from which the correctness follows. □

Figure 1: A sample graph G = (V, E) and the DFS traversal as per Algorithm [2] starting
from the topmost node. Arcs are examined in counter-clockwise order, starting from lower left.
Node labels are the DFS numbers, and tree edges are indicated as heavy, undirected edges.
Arcs disappear as they are being eliminated, leaving at the end the heavy DFS tree.

The algorithm also computes a DFS tree by setting parent pointers for the visited vertices.
Note that it can easily be extended to classify arcs into backwards, forwards, tree and cross arcs,
as is sometimes desired of a DFS traversal, without changing the time bounds. An example
execution of the algorithm is given in Figure [1].
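To make the control flow of Algorithm [2] concrete, the following is a sequential Python sketch. It assumes the graph is given as a list of lists out (out[u] standing in for ADJ[u].out); the function name dfs_arc_elimination and the use of separate arrays instead of record fields are choices made for this illustration only. The loop over inc[s] stands in for the par-construct and is executed sequentially, and the recursion is only suitable for small graphs.

```python
def dfs_arc_elimination(out, s, a=0):
    """Ordered DFS from s by arc elimination (sequential sketch of Algorithm 2)."""
    n = len(out)
    # Preprocessing as in Algorithm 1: incoming arcs and doubly linked lists.
    inc = [[] for _ in range(n)]
    for u in range(n):
        for i, v in enumerate(out[u]):
            inc[v].append((u, i))
    nxt = [[i + 1 for i in range(len(out[u]))] for u in range(n)]
    prv = [[i - 1 for i in range(len(out[u]))] for u in range(n)]
    first = [0] * n
    traversal = [None] * n
    parent = [None] * n

    def eliminate(u, i):
        # Link the ith arc out of u's list (assumed conventions: -1 = start, outdeg = end).
        p, q = prv[u][i], nxt[u][i]
        if p >= 0:
            nxt[u][p] = q
        else:
            first[u] = q
        if q < len(out[u]):
            prv[u][q] = p

    def dfs(s, a):
        for u, j in inc[s]:  # the par-loop: eliminate all arcs into s
            eliminate(u, j)
        traversal[s] = a  # vertex s is now visited
        a += 1
        while first[s] < len(out[s]):  # un-eliminated arcs lead to unvisited vertices
            v = out[s][first[s]]
            parent[v] = s
            a = dfs(v, a)  # visiting v eliminates the arc (s, v), so first[s] advances
        return a

    dfs(s, a)
    return traversal, parent
```

On out = [[2, 1], [3], [1], [], [0]] with start vertex 0, the sketch visits 0, 2, 1, 3 in that order and leaves the unreachable vertex 4 unnumbered, as a standard ordered DFS would.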

3 Breadth-First Search 

The arc elimination idea can also be used for parallel Breadth-First Search, as shown in
Algorithm [3]. The algorithm has the same structure as standard, "forwards" BFS [2], but performs
parallel arc elimination as new, unexplored vertices are added to the queue for the next level.
This ensures that each reachable vertex is explored once.

Proposition 2 Algorithm [3] computes an ordered Breadth-First Search numbering and tree in
O(m/p + n) time steps using p processors.

Proof: Again by Invariant [1], once a vertex has been visited it will never be considered again,
and therefore arc elimination is performed once for each reachable vertex. From this the time
bound follows. For each vertex in Q for some level, the un-eliminated arcs are considered in
the order determined by the representation of G, and as each unvisited vertex is put into the queue
Q' for the next level, all its incoming arcs are eliminated. This in particular ensures that there are
no arcs between vertices in Q'. As for standard BFS, all vertices in Q before the start of an
iteration of the innermost repeat loop have the same distance to the source vertex, from which
correctness follows. □

It is especially worth noticing that there are no arcs between nodes in Q', the queue being
filled for the next iteration. An example execution of the algorithm is given in Figure [2]. The
important property of BFS is that vertices are explored in least recently visited order; it is
therefore, also in the arc elimination algorithm, possible to dispense with the explicit next level
queue Q' and do with only a single repeat-loop [7].

Algorithm 3 Parallel Breadth-First Search from start vertex s ∈ V. Vertices that are reachable
from s will be assigned a BFS number starting from a; also the distance from s (in smallest
number of arcs) will be computed.
Procedure BFS(s, G, a):
par 0 ≤ i < ADJ[s].indeg do
    (u, j) <- ADJ[s].in[i]
    eliminate(G, u, j)
end par
l <- 0
ADJ[s].traversal <- a
ADJ[s].distance <- l
Q.enque(s) {Start vertex s visited}
Q' <- ∅
repeat
    l <- l + 1 {Next level}
    repeat
        u <- Q.deque()
        while ADJ[u].first < ADJ[u].outdeg do {As long as there are un-eliminated arcs}
            i <- ADJ[u].first
            v <- ADJ[u].out[i]
            par 0 ≤ j < ADJ[v].indeg do
                (w, k) <- ADJ[v].in[j]
                eliminate(G, w, k)
            end par
            a <- a + 1 {Next vertex}
            ADJ[v].traversal <- a
            ADJ[v].distance <- l
            ADJ[v].parent <- u
            Q'.enque(v) {Vertex v has now been visited, enque for next level}
        end while
    until Q = ∅
    Q <- Q' {Continue with the vertices of the next level}
    Q' <- ∅
until Q = ∅

Figure 2: The sample graph G = (V, E) and the BFS traversal as per Algorithm [3] starting
from the topmost node. Arcs are examined in counter-clockwise order, starting from lower left.
Node labels are the computed BFS numbers, and tree edges are indicated as heavy, undirected
edges. Arcs disappear as they are being eliminated, leaving at the end the heavy BFS tree.
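The level-by-level structure of Algorithm [3] can be sketched sequentially in Python as follows. As before, out is assumed to be a list of lists standing in for ADJ[u].out, and the function name bfs_arc_elimination is hypothetical. The two queues Q and Q' are modeled with collections.deque, the loop over inc[v] replaces the par-construct, and the repeat-until loops are written as while loops, which behaves the same here because the queue initially contains s.

```python
from collections import deque


def bfs_arc_elimination(out, s, a=0):
    """Ordered BFS from s by arc elimination (sequential sketch of Algorithm 3)."""
    n = len(out)
    # Preprocessing as in Algorithm 1: incoming arcs and doubly linked lists.
    inc = [[] for _ in range(n)]
    for u in range(n):
        for i, v in enumerate(out[u]):
            inc[v].append((u, i))
    nxt = [[i + 1 for i in range(len(out[u]))] for u in range(n)]
    prv = [[i - 1 for i in range(len(out[u]))] for u in range(n)]
    first = [0] * n
    traversal, distance, parent = [None] * n, [None] * n, [None] * n

    def eliminate(u, i):
        # Link the ith arc out of u's list (assumed conventions: -1 = start, outdeg = end).
        p, q = prv[u][i], nxt[u][i]
        if p >= 0:
            nxt[u][p] = q
        else:
            first[u] = q
        if q < len(out[u]):
            prv[u][q] = p

    for u, j in inc[s]:  # eliminate all arcs into the start vertex
        eliminate(u, j)
    level = 0
    traversal[s], distance[s] = a, level
    q = deque([s])
    while q:
        level += 1  # next level
        q_next = deque()  # plays the role of Q'
        while q:
            u = q.popleft()
            while first[u] < len(out[u]):  # un-eliminated arcs lead to unvisited vertices
                v = out[u][first[u]]
                for w, k in inc[v]:  # the par-loop: eliminate all arcs into v
                    eliminate(w, k)
                a += 1
                traversal[v], distance[v], parent[v] = a, level, u
                q_next.append(v)  # v is visited; explore it at the next level
        q = q_next
    return traversal, distance, parent
```

On out = [[2, 1], [3], [1, 3], [], [0]] with start vertex 0, vertices 2 and 1 are numbered at distance 1 in adjacency order, and vertex 3 is reached at distance 2 through vertex 2, since the arc (2, 1) has already been eliminated by the time vertex 2 is explored.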

4 Discussion 

The time bounds for both Depth- and Breadth-First Search algorithms guarantee linear speed-up
when p < m/n, or equivalently m > pn; that is, good speed-up is possible for graphs
with average degree larger than the number of processors. The algorithms presented here
are complementary to the standard, textbook, "forwards" procedures for DFS and BFS [2].
Standard DFS, where arcs are examined only in the forwards direction, has no parallelism; in
contrast, the algorithm given here can perform the arc elimination fully in parallel. Typical
parallel Breadth-First Search algorithms exploit parallelism mostly by considering active vertices
in the queue for each level in parallel, see, e.g., [HEJE]. Although the forward edges can also
be explored in parallel, compaction or other data structure operations are necessary for
resolving or avoiding update conflicts and maintaining the queue for the next level. The parallel
running time of such algorithms will typically be bounded by the diameter of the graph, but
either at the cost of more work incurred by data structure operations, or by requiring stronger,
atomic operations. The arc elimination approach does not require any of these means (data
structures, compaction, atomic operations), and has exploitable parallelism independent of the
BFS structure of the graph, as long as the total number of edges m satisfies m > np.
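The condition p < m/n can be illustrated with a small calculation. The sketch below evaluates the bound m/p + n directly, with all hidden constants taken to be 1; this is purely a modeling assumption, and the function names are hypothetical. In this model, p ≤ m/n gives m/p ≥ n, so the time is at most 2m/p while the one-processor time is at least m, and the speed-up is therefore at least p/2.

```python
def modeled_time(n, m, p):
    """Time steps under the bound m/p + n, with all constants assumed to be 1."""
    return m / p + n


def modeled_speedup(n, m, p):
    """Ratio of the modeled one-processor time to the modeled p-processor time."""
    return modeled_time(n, m, 1) / modeled_time(n, m, p)


# Example: n = 1000 vertices, m = 100000 arcs, so m/n = 100.
within_range = modeled_speedup(1000, 100000, 100)   # p = m/n
beyond_range = modeled_speedup(1000, 100000, 1000)  # p = 10 * m/n
```

With these numbers the modeled speed-up at p = 100 is about 50, whereas at p = 1000 it stays below (m + n)/n = 101, because the n synchronization steps dominate once p exceeds m/n.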